# Multimodal CLIP Fusion

Kandinsky 2.1 is a text-to-image generation model that builds on DALL-E 2 and on best practices from latent diffusion models. It combines a CLIP encoder with a diffusion-based image prior that maps between the CLIP text and CLIP image embedding spaces.
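
As a rough illustration of this two-stage design, the sketch below first runs the image prior to turn a prompt into a CLIP image embedding, then passes that embedding to the decoder. It uses the `KandinskyPriorPipeline` and `KandinskyPipeline` classes from Hugging Face diffusers and assumes that `torch` and `diffusers` are installed, that the checkpoints are available under the `kandinsky-community/kandinsky-2-1-prior` and `kandinsky-community/kandinsky-2-1` IDs on the Hugging Face Hub, and that a CUDA GPU is present. The function name `generate_image`, the 512x512 size, and the step count are illustrative choices, not settings from the model card.

```python
# Minimal sketch of the Kandinsky 2.1 prior + decoder flow with diffusers.
# Assumptions: torch and diffusers are installed, and the kandinsky-community
# checkpoints can be downloaded from the Hugging Face Hub.


def generate_image(prompt: str, device: str = "cuda", steps: int = 50):
    """Return a PIL image for `prompt` (hypothetical helper for illustration)."""
    # Imports are deferred so this module can be loaded without the heavy deps.
    import torch
    from diffusers import KandinskyPipeline, KandinskyPriorPipeline

    # Half precision on GPU; fall back to float32 elsewhere.
    dtype = torch.float16 if device == "cuda" else torch.float32

    # Stage 1: the diffusion image prior maps the prompt's CLIP text embedding
    # to a CLIP image embedding, plus a negative embedding that the decoder
    # uses for classifier-free guidance.
    prior = KandinskyPriorPipeline.from_pretrained(
        "kandinsky-community/kandinsky-2-1-prior", torch_dtype=dtype
    ).to(device)
    prior_out = prior(prompt)

    # Stage 2: the decoder generates the image conditioned on that embedding.
    decoder = KandinskyPipeline.from_pretrained(
        "kandinsky-community/kandinsky-2-1", torch_dtype=dtype
    ).to(device)
    result = decoder(
        prompt,
        image_embeds=prior_out.image_embeds,
        negative_image_embeds=prior_out.negative_image_embeds,
        height=512,
        width=512,
        num_inference_steps=steps,
    )
    return result.images[0]
```

For example, `generate_image("red cat, 4k photo").save("cat.png")` would download both checkpoints on first use and write the result to disk. The helper reloads both pipelines on every call for brevity; in practice you would load the prior and decoder once and reuse them across prompts.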

kandinsky-community
